{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RAG With llama-index  + Milvus + LLama\n",
    "\n",
    "References\n",
    "- https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/\n",
    "- https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/milvus/?h=milvusvectorstore#llama_index.vector_stores.milvus.MilvusVectorStore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-1: Configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from my_config import MY_CONFIG\n",
    "\n",
    "MY_CONFIG.DB_URI = './rag_2_llamaindex.db'\n",
    "MY_CONFIG.COLLECTION_NAME = 'llamaindex_papers'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-2: Setup Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If connection to https://huggingface.co/ failed, uncomment the following path\n",
    "import os\n",
    "os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
    "from llama_index.core import Settings\n",
    "\n",
    "Settings.embed_model = HuggingFaceEmbedding(\n",
    "    model_name = MY_CONFIG.EMBEDDING_MODEL\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-3: Connect to Milvus"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ Connected Llama-index to Milvus instance:  ./rag_2_llamaindex.db\n"
     ]
    }
   ],
   "source": [
    "# connect to vector db\n",
    "from llama_index.core import VectorStoreIndex, StorageContext\n",
    "from llama_index.vector_stores.milvus import MilvusVectorStore\n",
    "\n",
    "vector_store = MilvusVectorStore(\n",
    "    uri = MY_CONFIG.DB_URI ,\n",
    "    dim = MY_CONFIG.EMBEDDING_LENGTH , \n",
    "    collection_name = MY_CONFIG.COLLECTION_NAME,\n",
    "    overwrite=False  # so we load the index from db\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "\n",
    "print (\"✅ Connected Llama-index to Milvus instance: \", MY_CONFIG.DB_URI )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-4: Load Document Index from DB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ Loaded index from vector db: ./rag_2_llamaindex.db\n",
      "CPU times: user 95 ms, sys: 10.4 ms, total: 105 ms\n",
      "Wall time: 104 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "index = VectorStoreIndex.from_vector_store(\n",
    "    vector_store=vector_store, storage_context=storage_context)\n",
    "\n",
    "print (\"✅ Loaded index from vector db:\", MY_CONFIG.DB_URI )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-5: Setup LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.replicate import Replicate\n",
    "from llama_index.core import Settings\n",
    "\n",
    "llm = Replicate(\n",
    "    model= MY_CONFIG.LLM_MODEL,\n",
    "    temperature=0.1\n",
    ")\n",
    "\n",
    "Settings.llm = llm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-6: Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Granite models were trained using a combination of code data and natural language datasets. The code data was collected for model training, while the natural language datasets included web documents (Stackexchange, CommonCrawl), mathematical web text (OpenWebMath, StackMathQA), academic text (Arxiv, Wikipedia), and instruction tuning datasets (FLAN, HelpSteer). These datasets were not deduplicated. Additionally, the datasets were scanned using ClamAV to identify and remove instances of malware in the source code. Personal Identifiable Information (PII) such as keys, email addresses, names, user names, and passwords were redacted and replaced with corresponding tokens. IP addresses were also changed with synthetically generated IP addresses.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "res = query_engine.query(\"What was the training data used to train Granite models?\")\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The attention mechanism is a concept used in machine learning, particularly in models like transformers. It allows the model to focus on different parts of the input data when producing an output, mimicking the cognitive process of attention in humans.\n",
      "\n",
      "In the context of natural language processing, the attention mechanism helps the model understand the context of words in a sentence by weighing their importance relative to each other. This is done by computing a set of attention scores, which determine the weight or importance of each word in the input sequence when generating the output.\n",
      "\n",
      "The attention mechanism can follow long-distance dependencies, as demonstrated in the example provided. In this case, many attention heads in layer 5 of the encoder self-attention attend to a distant dependency of the verb 'making', completing the phrase 'making...more difficult'.\n",
      "\n",
      "There are different types of attention mechanisms, such as Scaled Dot-Product Attention and Multi-Head Attention. Scaled Dot-Product Attention computes the dot products of the query with all keys, divides each by the square root of the key dimension, and applies a softmax function to obtain the weights on the values. Multi-Head Attention, on the other hand, performs the attention function in parallel on different projected versions of queries, keys, and values.\n",
      "\n",
      "The attention mechanism is crucial for models to understand the context and dependencies in sequences, making it particularly useful in tasks like machine translation, text summarization, and question answering.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "res = query_engine.query(\"What is attention mechanism?\")\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The provided context does not contain information about the moon landing. Therefore, I cannot provide an answer to your query. The text discusses IBM Granite Code Models, a family of open foundation models for code intelligence.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "res = query_engine.query(\"When was the moon landing?\")\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dpk-4-rag-pdf-r1.1.0-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
